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Strong foreground contamination in high resolution CMB data requires masking which introduces 
statistical anisotropies and renders a full maximum likelihood analysis numerically intractable. 
Standard analysis methods like the pseudo-C; framework lead to information loss due to estimator 
suboptimalities. We set out and validate a methodology for numerically efficient estimators for 
a masked sky that recover nearly as much information as a full maximum likelihood procedure. 
In addition to the standard pseudo-C; observables, the approach introduces an augmented basis 
designed to account for the mode coupling due to the masking of the sky. We motivate the choice 
of this basis by describing the basic structure of the covariance matrix. We demonstrate that 
the augmented estimator can achieve near-optimal results in the presence of a WMAP-realistic mask. 
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I. INTRODUCTION 

The success of high resolution CMB experiments like 
WMAP has opened up a wealth of new possibilities to 
test cosmological models of structure formation. The 
large amount of data has been used to place strong con- 
straints on parameters of statistical models for the CMB 
temperature fluctuations. Developing accurate and nu- 
merically feasible estimators that can optimally extract 
cosmological information from the data is a central prob- 
lem in CMB analysis. 

Let us denote the deviation from the mean CMB tem- 
perature observed in direction h by AT(n). To a very 
good approximation AT(h) defines a Gaussian random 
field on the sphere with zero mean. We will assume for 
now that we can neglect any potential sources of non- 
Gaussianities, so the random field is entirely determined 
by its 2-point function. It is convenient to treat the prob- 
lem in multipole space and define multipole coefficients 
of the temperature fluctuations via 



aim = / dnY lm (h)AT(h) 



(1) 



Assuming the universe to be statistically isotropic, we 
also expect the random field AT(n) to be statistically 
isotropic. It can be shown that this implies 



I1I2 ^mi? 



(2) 



where C/ is the power spectrum of temperature fluctu- 
ations. Hence the statistics of the temperature fluctu- 
ations are entirely determined by their power spectrum 
and cosmological parameters influence the fluctuations 
solely through the power spectrum. 

A common technique to extract information from the 
CMB is to estimate parameters of the statistical model 
using Maximum Likelihood (ML) estimation which will 
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be discussed in more detail below. ML estimation for 
Gaussian distributions generically requires in some way 
the inversion of the covariance matrix. The discussion 
above suggests that this is not a problem as the covari- 
ance structure of the multipole coefficients seems to be 
extremely simple. It is merely a diagonal matrix that 
can be trivially inverted. Unfortunately the true mul- 
tipole coefficients in the above discussion are not acces- 
sible in practice and the covariance structure of experi- 
mental multipole coefficients is much more complicated. 
This is due to two reasons. First, anisotropic instrumen- 
tal noise contributes to the measured multipole coeffi- 
cient and leads to correlations between them. Secondly, 
foreground contamination due to radiation sources both 
within our on galaxy and beyond make it necessary to 
mask the sky with certain templates. The spherical har- 
monics are not orthonormal on an incomplete sky and 
hence the introduction of the mask leads to significant 
mode coupling and cross correlations between the multi- 
pole coefficients. 

These two effects have to be taken into account in the 
analysis and they turn the covariance matrix into a dense 
complicated matrix. Unfortunately, this means that in- 
version becomes intractable for high resolution experi- 
ments. These analyse the CMB up to multipoles with 
(„ of a few times I0 3 implying that for exact ML meth- 
ods the inversion of a huge 10 x 10 matrix is required 
which is numerically not feasible at least by direct calcu- 
lation. 

To circumvent this problem many approximate meth- 
ods have been developed. One of the most important 
approaches is the so-called pseudo-C; (PCX) approach 
that is discussed in section III Dl While PCL estimation 
can give rise to good estimators, it is in general not as 
accurate as the full ML evaluation. The goal of the work 
discussed in this report is to develop and ultimately im- 
plement alternative estimators that improve on PCL es- 
timation and are much closer in accuracy to ML esti- 
mators. The big advantage of PCL estimation is that 
the data is compressed into a very limited number of ob- 
servables which reduces the computational effort tremen- 
dously. Unfortunately, the compression is not lossless as 
soon as statistical anisotropies are introduced. We are 
going to show how one can improve significantly on PCL 
estimation by introducing a small number of additional 
observables that recover the off-diagonal information not 
picked up in a PCL analysis. 

The approach is inspired by the basic idea discussed in 
[l[ . Introducing a very limited number of modes or ob- 
servables that are quadratic in the multipole coefficients, 
the authors suggest basing the CMB analysis on these 
observables rather than the full set of multipole coeffi- 
cients. This is of course exactly in the spirit of PCL 
estimation. However the authors do not restrict them- 
selves to PCL observables but rather keep the freedom to 
choose observables or basis functions in multipole space 
(giving rise to observables by considering inner products 
with the data) adapted to the problem at hand. This 



is motivated by previous work on higher order correla- 
tion functions where these modal methods have led to 
significant numerical simplification^] 0-0] • 

Our purpose here will be to investigate the structure 
of the masked covariance matrix and hence to iden- 
tify the additional modes beyond the PCL observables 
which capture the 'off-diagonal' information required for 
a near optimal analysis. We will focus on the statistical 
anisotropies introduced by the mask and we will not be 
considering instrumental noise throughout most of the 
discussion. 

The paper is organised as follows. In section |TT] we 
start by discussing the ML approach to cosmological pa- 
rameter estimation. First we review the simplified case 
of complete sky coverage to develop some basic ideas and 
then focus on the case of incomplete sky coverage. We 
briefly study PCL based CMB analysis and then proceed 
to discuss our approach to the problem. In particular in 
section Mil we introduce an approximation to the exact 
ML evaluation based on an arbitrary set of observables. 
This approximated estimator is extremely useful as it re- 
duces to PCL estimation if one chooses PCL observables 
but for appropriate sets of observables, in particular for 
a complete set of observables, it is exactly equivalent to 
ML estimation. In this sense it interpolates between PCL 
and ML estimation and one can systematically improve 
on PCL estimation by including new observables. We go 
on to construct a set of observables that in combination 
with PCL observables leads to significant improvements 
in the performance of the estimator in the presence of 
a mask compared to standard PCL estimation. Section 
IIVI contains a systematic analysis that is intended to de- 
velop a deeper understanding of the structure of masked 
covariance matrices and why the new observables im- 
prove PCL estimation. Readers that are not interested 
in these technical details can skip this section and di- 
rectly move on to section [V] where the new observables 
and a summary of our method is presented. Results of 
numerical investigations validating the performance of 
the estimator can be found in section IVII We conclude 
by summarising progress with this methodology to date 
and providing an outlook for what remains to be done in 
future work. 



Conventions 



In the context of higher order correlation functions the main 
goal was to arrive at separable approximations to theoretical 
polyspectra by expanding in a small set of separable modes to 
reduce the computational complexity of the problem. While sep- 
arability of the basis functions will speed up calculations in this 
paper too we will focus on the basic claim that there exist ap- 
propriately chosen sets of basis functions that capture nearly all 
the information but are still small enough to ensure tractability 
of the problem. 
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To ease notation, pairs of indices (i,,mj) will usually be 
denoted as (l^mj) = tj. For example an isotropic co- 
variance matrix of multipole coefficients will be written 

as 

( a hmi0.l 2 m 2 ) — Ci 1 Si 1 i 2 S mim2 = Cj^^fe = (a^diz) (3) 

Unless stated otherwise, repeated indices are summed 
over. We use a real set of spherical harmonics Y t R = Y t f n 
throughout to keep the multipole coefficient of the tem- 
perature fluctuations manifestly real. 



II. COSMOLOGICAL PARAMETER 
ESTIMATION 

As a starting point for our discussion of CMB analysis 
we assume a data vector a consisting of the multipole 
coefficients a\ of the measured (pixelised) map of tem- 
perature fluctuation AT(nj), that is, with a discrete set 
of pixel directions i = 1, 2, ... A^ pix . These are defined by 

at = Yi R {ni)AT(ni)U(ni) (4) 

hi 

up to a certain Z max depending on the resolution of the 
experiment. Here, U(h) is the mask function that van- 
ishes in the masked regions and equals unity elsewhere. 
The mask function accounts for incomplete sky coverage. 
This is necessary either because the experiment does not 
observe the entire celestial sphere, which is usually the 
case for earthbound missions, or because a fraction of 
the sky is obscured by foreground contamination as dis- 
cussed in the introduction. In general, one could also de- 
fine pseudo multipole coefficients obtained by replacing 
the mask function U (h) by a weight function W(n) that 
vanishes in contaminated regions but is not restricted to 
equal I elsewhere. In particular one can choose W(h) 
that rise smoothly from to I so that ringing in multi- 
pole space due to discontinuities in pixel space is reduced. 
The accuracy of certain analysis schemes, in particular 
PCL estimation, can benefit from thi^|. In this work we 



2 As mentioned earlier, PCL estimation suffers from sub- 
optimalities on an incomplete sky due to mode coupling that 
renders the covariance matrix off-diagonal. Using a mask U(h) 
that has discontinuities in pixel space leads to ringing in mul- 
tipole space and can enhance mode coupling. It is possible to 
reduce ringing to some extent by using continuous weight func- 
tions W(n) in the hope of reducing the variance of the resulting 
estimators. While PCL estimation can benefit from this apodis- 
ing procedure 0, [f| it seems somewhat arbitrary and ad-hoc and 
it is not clear to what extent optimality can be restored simply 
by smoothing the mask. Hence we avoid apodisation of the mask 
in this work and attempt to restore optimality more systemat- 
ically. It should be noted that the estimator we suggest can in 
principle also be applied to data masked with an arbitrary weight 
function so apodisation procedures can be used in combination 
with the improved PCL estimator discussed here. 



specify a mask function U(h) and attempt to construct 
efficient estimators based on the pseudo-multipole coeffi- 
cients (j4}. While some of the arguments we will use only 
strictly apply for such a pure mask function the major 
conclusion and results should also be applicable for more 
general weight functions W(n) (e.g. apodised masks). 

The definition for the ay given in (f?]) above obviously 
assumes that the raw data from the satellite has already 
been processed and compressed into a pixelised map of 
temperature fluctuations AT(fii). How to compress the 
time-ordered data into a map has been extensively dis- 
cussed elsewhere and is not our concern tier^H- 

The CMB is to a very good approximation Gaussian 
so the probability density function (PDF) of a is entirely 
determined by the covariance matrix of multipole coef- 
ficients C i.e. given by V (a\C). The central problem 
in CMB analysis is to estimate a set of parameters {e Q } 
that parametrise the covariance matrix C = C({e a }). 

A standard approach to this problem is using Maxi- 
mum Likelhood (ML) estimation. In the framework of 
ML estimation we arrive at estimates e a by maximis- 
ing the likelihood function given in terms of the PDF via 
C {{e a }\a) = (V {a\C({e a })) with respect to the e a . The 
estimates are then simply given by e a — e^ IL where e^ L 
is the set of parameters that maximises the likelihood. 
Many ML methods have been discussed in the literature 
and applied to datasets [1, Il0l - [l3l | . To review the basic 
ideas involved it is sensible to start by studying the case 
of complete sky coverage, i.e. we set the mask function 
U(n) = I everywhere and the multipole coefficients de- 
fined above are the true multipoles of the temperature 
fluctuations. 



A. CMB analysis on the complete sky 

In the case of complete sky coverage the covariance 
structure of the multipole coefficients ai is very sim- 
ple. Since we expect the fluctuations to be statistically 
isotropic we need 

( c )hh = ( a ti a i 2 ) = C h 5 hl2 (5) 

where C; is the power spectrum of the CMB. It is 
straightforward to see that the Gaussian PDF for the 
temperature fluctuations 

V(a\C) = ±ex P (-±(a) T C- 1 a^j (6) 
iV = y / (2 7 r) rank ( c ')det(C) (7) 



3 See for example 7, 8j for early work and 9] and references within 
for more recent developments. 



4 



can be written in terms of a number of Ci estimators Ci 



V(a\{C l }) = ±expl-±J2( 2l + 1 ' 



Ci 



(2Z+1) 



N= /(27r)(W+i)=JJC ( 



where we have substituted ([5]) and we define 



2/ 



+ i ^ 



(8) 



(9) 



(10) 



ft is easy to show that the Ci are ML estimators for the 
C;. Furthermore they are unbiased (Cj) = C; and are ef- 
ficient estimators as they achieve the Cramer-Rao lower 
bouncjfl on the variance of an unbiased estimator for all 
values of the parameters C;. In this sense we cannot de- 
rive any better estimates for the C;. 

Cosmological parameters e Q that determine the power 
spectrum C/({e a }) can be determined by finding their 
ML values. This can be done using Newton-Raphson it- 
erations giving rise to a Newton-Raphson ML (NRML) 
estimator. A detailed derivation in the most general case, 
i.e. not assuming complete sky coverage, can be found in 
[T^ | . It is also reviewed in the appendix [B] It should 
be pointed out that the procedure is not exactly the 
Newton-Raphson method but rather an approximation 
where each step depends on the data in a quadratic fash- 
ion. In fact the approximation should be extremely good 
for the typical sample size of a high resolution CMB ex- 
periment. Rather than worrying about this technicality 
here we will simply ignore it and refer to the quadratic 
approximation as the NRML estimator (consistent with 
14]) and discuss this issue in the appendix. To find the 
ML value we start with a fiducial guess e Q giving rise to 



4 The Cramer-Rao bound in its simplest form is a lower bound 
on the variance of an unbiased estimator. For an estimator e 
estimating a single parameter e without a bias, i.e. one that 
satisfies (e) = e, it states that we need 

1 



Var(e) = <n-<eT > 



F(e) 



(11) 



where F(e) is the Fisher information defined by 

*W-<(2fc2)"> C* 

The generalisation to an unbiased multi-parameter estimator e a 
estimating a set of parameters e a reads 

Cov (e a ,e>) = {e a ep) - {e a )(e ) > F~p (13) 

where the Fisher information matrix is given by 



d log V d log V 

fa/3 = {— ^ a ' 

de Q den 



(14) 



and the matrix equation A > B for two square matrices A and 
B means that the matrix A — B is positive definite. 



a fiducial power spectrum Ci(e) and calculate Se a via 
1 __ x ^ dCi (21 + 1) 



Se <x = n^J E 



dep Cf 



Ci-CA (15) 



where 



1 ^ dCi (21 + 1) dCi 



2 ^ de a Cf dep 



(16) 



This will converge to the ML values of the e Q given that 
one does not get stuck in a local minimum. 

Assessing the accuracy of the parameter estimation is 
somewhat trickier than in the Ci case. It seems very 
hard to arrive at an analytical expression for the expec- 
tation values and variances as we only have an iterative 
procedure at hand to arrive at the ML values. To obtain 
an approximate expression for the covariance matrix let 
us adopt the following point of view. We assume that 
the sample size is large enough so that the bias of the 
ML estimates and e^f L — e a generally are small and the 
likelihood sufficiently close to Gaussian so that to a good 
accuracy we obtain the ML estimates from a single iter- 
ation 



5e r 



dC_i (21 + 1) 

der- 



2Cf 



(Ci ~ Ci) (17) 



away from the true parameter values e a . In this case we 
arrive at an estimate for the variance of our estimator 
given by 



a/3 



(18) 



Of course in practice we do not know the true values of 
the parameters. But since we assume the ML estimates 
to be generally very close to the true values we might as 
well just evaluate the above expression using the ML val- 
ues of the parameters for our set of data. In practice this 
means using the covariance matrix of the last iteration 
step as an estimate for the covariance matrix of our ML 
estimates which is also the route taken in [l 21 ] . 

Note that this way of assessing errors somewhat as- 
sumes that the infinite sample size limit applies. As the 
ML estimator is expected to be asymptotically efficient, 
i.e. it achieves the Cramer-Rao bound for all values of the 
parameters in this limit, this method of assessing errors 
should give rise to the conclusion that ML estimation is 
optimal. Indeed it can be shown that we have T a p = F a p 
where F a p is the Fisher matrix so that, in this approxi- 
mation, ML estimation gets assigned the lowest possible 
error bars as we expected. 

Potential issues with this way of assessing errors could 
arise if the parameters we want to estimate solely influ- 
ence the power spectrum at low I. In this situation it is 
questionable to what extent the sample size can be con- 
sidered large and the likelihood might exhibit significant 
deviations from Gaussianity. This is because there is only 
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a very limited amount of information on the parameters 
in the small I modes of a particular CMB realisation. 
Generally, physical parameters will influence the power 
spectrum at all scales and hence the information content 
of a particular CMB realisation should become extremely 
large as one analyses the fluctuations to small scales. 



B. CMB analysis on an incomplete sky 

The most severe idealisation in the analysis above was 
the assumption that we can recover the true multipole 
coefficients of the CMB from experiments. Due to fore- 
ground contamination real CMB experiments can only 
measure the temperature fluctuations AT(nj) on part of 
the sky. Hence instead of the true multipole coefficients 
the data can only be represented by a vector of masked 
multipole coefficients a coefficients given by 



E 



y l fl (n i )AT(n l )i7(n l 



(19) 



up to a certain l max depending on the resolution of the 
experiment. Recall that U (n) is the mask function that 
vanishes in the masked regions and equals unity else- 
where. 

Even though we are now dealing with masked multi- 
pole coefficients the statistics clearly remain Gaussian, 
so it is still possible to express the PDF in the form of 
the full sky version ([6]) but with an appropriately mod- 
ified covariance matrix. However, complications arise 
because the new covariance matrix C becomes singu- 
lar (observations of different multipole coefficients are 
no longer statistically independent) and the simple in- 
version in ([5]) is not a well-defined operation. Instead, 
we have to identify what remains of the column space of 
C, that is, the image Im(C), and then essentially in- 
vert C restricted to this subspace. In other words, we 
need to find the (unique) pseudoinverse C + that satis- 
fies CC+C = C, C+CC+ = c+ (cc+) T = c c + 

and (C + C) T = C + C. Substituting C + for the inverse 
of the full sky covariance C -1 in ^ will then yield the 
masked PDF (a more detailed discussion can be found in 
Appendix . Let us define the following numbers, 



N := rank (C) = dim (Im(C)) , and 
K := nullity (C) = dim (/Cer(C)) 



(20) 



with N + JC = (Z max + I) 2 where /Cer(C) is the kernel or 
null space of C. If we introduce N vectors {V„} that 
form an orthonormal basis of Tm(C), then any masked 
realisation of the CMB can be expanded as 



a = 2J A n V n withA„ = V n ■ a . 



(21) 



the vector A is then 
V(A\C) 
with normalization factor 



ex P \-\ aTC+a 



N= y / (27r) A ^det + (C) 



(22) 



(23) 



Here det + (C) is the pseudodeterminant of C given by 
the product of all J\f non-zero eigenvalues. 

We would like to estimate cosmological parameters 
from this PDF. The construction of the iterative NRML 
estimator works just as in the full-sky case considered 
above with the result (compare [l2j]) 



5e a 



l T -! d(C) bl2 
2 @ dep 



{C+) llVi {C + ) l2 y 2 (a^-iC)^) 

(24) 



where 



FaP — - 



ld(C) hl 



2 de a 



iC + ) hVi (C + ) h 



(25) 



Again we refer to the appendix [B] for the details of the 
derivation. Within the approximations discussed in the 
last section the ML estimator is efficient as it has a vari- 
ance approximately given by and one can show that 
the Fisher matrix is given by F a p = T a p. 

Unfortunately, evaluating the NRML is not feasible al- 
ready for moderately large / max of order 10 2 as it requires 
the inversion of the full multipole covariance matrix. In 
the case of the full sky analysis the covariance matrix was 
simply diagonal but the coupling of the spherical harmon- 
ics on an incomplete sky turns the covariance matrix into 
a complicated dense matrix as is discussed in the next 
section and inversion is no longer feasible. 



C. Masked covariance matrices 

When we introduce a mask in pixel space we simply 
set the signal on the contaminated pixels to zero. We can 
think of this operation as a projection onto the masked 
sky. In pixel space the operation is extremely simple. 
However, in multipole space the corresponding projec- 
tion operator P relating configurations on the complete 
and the masked sky is given by the coupling matrix of 
spherical harmonics on the masked sky 



(P) [ll2 = / tfnU(n)Y£{n)Yg{n) 



(26) 



Note that, working for now up to arbitrarily high we 
can make use of the completeness of the spherical har- 
monics and we have exactly 



for an inner product summed over all I, m. The PDF for 



P 2 = P and P T = P, (27) 
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Figure 1. Schematic of the action of the projection operator V 
induced by masking the sky, see eqns (|26p and (|30[1 . The full 
sky covariance matrix Co is projected to C for the masked 
sky which lies in the image of V (represented by a plane). The 
subset of isotropic covariance matrices (blue line) is mapped 
to a subset (red line) lying in Zm('P). The kernel /Cer('P) 
includes all matrices C' which are P-equivalent to zero. 



so that P is indeed an orthogonal projection operator. 

Starting with an isotropic theory on the complete sky 
described by a power spectrum C/ the masked covariance 
(neglecting instrument noise for now) can be calculated 
to be 

{C) hh = J d 2 nid 2 n 2 U(fn)U (n 2 )Y,f {ni)Yg (n 2 ) 

x (AT (hi) AT (h 2 )) (28) 

S v ' 

= (P) hVi C Vi (P) Vil2 (29) 

In principle, the sum over l[ in eqn (|28|) extends to in- 
finite I' unless we are dealing with band limited power 
spectra. Realistic CMB power spectra decay quickly and 
so extending the sum to higher I' will have little effect 
on lower I correlations. If we are only interested in the 
covariance matrix up to a certain (large) Z max because 
of beam resolution limits or a poor signal-to-noise ratio, 
then it is reasonable to cut off the sum without introduc- 
ing significant errors. 

If we include all multipole coefficients in the analysis, 
notably the monopole and dipole that are not observ- 
able in practical and work up to high I, then we can un- 
derstand the masked covariance matrix as the isotropic 
diagonal covariance sandwiched between two orthogonal 



projection operators P. This operation 

V(C) = PCP, (30) 

inherits its properties from P and is itself a projection^ V 
is since V (V(C)) = V(C). The action of the projection 
V is illustrated schematically in fig. [TJ The subset of 
isotropic covariance matrices Co (represented by a line 
in fig. [1]) is mapped to a set of anisotropic covariance 
matrices (another line) lying in Tm(V). This picture is 
useful for the study of estimators on the incomplete sky 
as in quadratic estimators any matrices weighing the data 
are generically sandwiched between projected quantities 
and hence only their projected parts are important. To 
formalise this slightly further we introduce the notion 
of a P-equivalence relation between matrices induced by 
the projection V. Given a projection operator P, two 
matrices A and B are said to be P-equivalent if they are 
the same under the projection V, that is, 

A £ B PAP T = PBP T (32) 

Clearly, the simplest example is Co = (C'; 1 (5[ 1 [ 2 ) ~ C, 
that is, the isotropic covariance Co is P-equivalent to 
the corresponding masked covariance matrix C. 

Summing up, we see that the masked covariance is sig- 
nificantly different from a simple diagonal matrix due to 
the non-trivial coupling of spherical harmonics on the 
masked sky and cannot be inverted analytically. For 
high resolution experiments the dimension of the covari- 
ance matrix is of order 10 7 and numerical inversion is 
not feasible. For this reason, there has been much effort 
over the recent decades to construct approximate meth- 
ods that allow a fast estimation of parameters while re- 
taining the accuracy offered by a ML procedure (see for 
example 0, [TM1])- The most prominent and widely- 
used method is pseudo-C; (PCL) estimation. 



D. Pseudo-C; estimation 

Instead of evaluating the full likelihood, PCL estima- 
tion relies on the introduction of quantities Ci that are 
obtained from the a; m just as the Ci in the case of com- 



For consistency we never remove the monopole and dipole in 
this work from simulated CMB maps but simply assume a value 
for the power spectrum coefficient Co and C±. Removing the 
monopole and dipole, as it has become standard, should not 
pose significant problems for the methods developed in this work. 
However, the approach we choose is simpler to understand when 
the monopole and dipole are included as the interpretation of P 
as a simple orthogonal projection operator is available. 



In fact, if we choose the Frobenius inner product as an inner 
product A : B on the space of matrices, 

A . B = Tr[AB T ] , (31) 

which is a natural choice in the context of our discussion later on, 
the projection V is an orthogonal projection just like P. This is 
easy to see as V is clearly self-adjoint with respect to this inner 
product, V(A) : B = A. : V(B), and a self-adjoint projection is 
orthogonal. Hence it is sensible to draw image and kernel of V 
at a right angle in figure [T] 
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plete sky coverage 



Irn 



Using the form of the covariance matrix on a masked sky 
it is easy to show that they are related to the true power 
spectrum C\ via 



where 



(cy = (M) hh c h 



It has been argued by many authors that PCL estimation 
is very close to optimal at high / and that there is no real 
^33^ need for a significantly more costly ML estimation. It has 
been suggested that the known suboptimality of PCL es- 
timation at low I can be circumvented by combining high 
/ PCL estimates with low I ML estimates obtained from 
smoothed maps [ELfl^l]. The emerging hybrid estimator is 
claimed to recover nearly all information on cosmological 
parameters present in a, a proposal we will address later 

(34) 

on when we study the properties of the masked covari- 
ance matrix in a realistic context. 



(M) hh = (2l 2 + l)J2 



2*3 + 1 
47T 



U la 



h h h 




(35) 



with Ui given in terms of the multipole coeffients ui m of 
the mask function U (n) 



21 



+ i ^ 



lm 



(36) 



Given that M is invertiblc, which is usually the case 
given the fraction / s w covered by the experiment is not 
too small (compare [14] and references within), we can 
obtain unbiased PCL estimates Cf 



& = (m-^^a 



(37) 



In contrast to the full-sky case these estimators are not 
ML estimators for the C\ . Furthermore, as is well known, 
they are in general not optimal. We will discuss this point 
in more detail below. 



A sensible way of estimating cosmological parameters 
e a using PCL methods would be, for example, to start 
with a fiducial guess, calculating the covariance matrix 
(ACf ACf 2 ) for the corresponding power spectrum and 
then minimising the \ 2 

X 2 = (CI C h (e Q )) ((ACW)- 1 ^ i (CI - C h (O) 

(38) 

We can calculate the next step via a Newton-Raphson 
iteration for finding the zero of dx 2 given by 



Sea = ((A^ACP)- 1 )^ (Cf 2 - C l2 ) (39) 

where 

na, / . . .\ Fir, 

(40) 



In the framework of approximations made in the sections 
above we can use the covariance matrix of the last itera- 
tion as an estimate for the covariance matrix, i.e. 



III. GENERALISED APPROACH TO CMB 
PARAMETER ESTIMATION 



As discussed above a brute force numerical evaluation 
of the full NRML estimator 

is numerically not feasible as it involves the pseudoin- 
verse of the huge covariance matrix. Hence it is desirable 
to find approximations to the estimator that do not re- 
quire the inversion of the full covariance matrix but still 
perform nearly as well. Here we introduce a class of es- 
timators that can be thought of as approximations to 
the NRML estimator. As special cases this class includes 
the NRML estimator itself and also the PCL estimator 
discussed in the section above. 

The basic idea is to introduce a set of real basis func- 
tions or modes Q n (h,h)- The efficacy of the method 
depends on the overall number n max of modes required 
being modest n max <C (Imax + l) 2 - We can then define a 
new (iterative) estimator with (compare [H) 



1 da T x dot 

1 a/3 = 77 ~q S "5 

2 oe a oep 



(43) 
(44) 



where the modal coefficients are given by the summations 
(or inner products) 

(a)n = Qn(h,h)(C) hl2 (45) 

(/3)„ = Q„(M 2 Ka, 2 (46) 

(Onin, = Qn{h, h) (CVi(C)t^ QnOi, Q (47) 

The new modal estimator has a number of desirable prop- 
erties. Given that the coefficients (3 and ot can be easily 
calculated this estimator has significant advantages from 
a computational point of view. The modal covariance 
matrix £ can then be obtained to arbitrary precision from 
Monte Carlo sampling via 



(41) 



1 



(((3 - a)(f3 - af) 



(48) 
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and since we are typically dealing with a limited number 
of basis functions n max the inversion £ _1 becomes a triv- 
ial task. The problem reduces to finding a set of basis 
functions or modes Q n (h, h) with the property that the 
estimator performs close to optimal. Before proceeding 
it is useful to discuss a number of special cases. 



A. NRML estimator and optimal approximations 

Obviously, if we introduce a complete set of basis func- 
tions Q n the estimator is mathematically identical to the 
full NRML estimator. However, it can also be equivalent 
to the full estimator even if the modal set is not com- 
plete, provided the modal expansion is equivalent to the 
estimator product (|42j) up to the projection P. This can 
be expressed more formally using the _P-equivalence rela- 
tion defined earlier in (|32l) . that is, the modal estimator 
will be optimal if 



d{C) hhiri 

de a 



(C + ) hVl (C + ) l2l , £ £<£Q w (ti,&). (49) 



B. Pseudo-C; as a modal estimator 

The second case of interest is the choice of PCL basis 
functions 



2Zi + l 



for which we then simply have 

(a) h = (M) hh C h 
(f3) h = (M) hh Cf 2 

(0u 2 = l(M) llVi (M) l2l ,{ACf,ACf,) 



(50) 



(51) 
(52) 

(53) 



Plugging this into the expressions for the approximated 
NRML modal estimator P3")l shows that we arrive ex- 
actly at the iterative PCL estimator introduced previ- 
ously (15<?1) . 

In the next section we will endeavour to answer the key 
question about when and why PCL estimation is equiv- 
alent, or at least close in accuracy, to the full NRML 
estimator. The discussion motivates our choice of basis 
functions in section|V]and aims at understanding why the 
extended set of basis functions gives rise to an estimator 
more accurate than simple PCL estimation. 



The reason for this is because we then have 

= = E7nQn(li,li)(C) Ill3 (C) IiU Q m (l3,l4) 



+, d(C),,,, 



hence 



dtp 



l d(C) hh . + 
^on /I'rM^ki Ida ida 



and also for the full estimator 

<*'.. -J": ! — —(C + ) h i i (C + ) l2l , 2 (a Vl a l>2 - (C)^) 



1 ^(C),^ 

2 ^ dep 



We see that it is not at all necessary to introduce a com- 
plete set of basis functions. In principle, all that is re- 
quired is a set of basis functions satisfying and the 
approximated estimator will be identical to the original 
NRML estimator. Of course we do not know how to cal- 
culate C + which is why we are exploring these alternative 
methodologies in the first place. 



IV. PROPERTIES OF MASKED COVARIANCE 
MATRICES 

A. Simplest case with constant power spectrum 

Although realistic power spectra are not constant, this 
case is of some interest because it provides insight into 
more realistic situations of practical importance. As we 
shall see it is the one special case in which the pseudo-C; 
estimator proves to be sufficient for achieving optimality. 

For an approximately constant power spectrum C ~ 
Co the mulitpole covariance matrix is given by C = CqP 
using (O and (|28|) . Since P is an orthogonal projection 
operator it is its own pseudoinverse, i.e. P + = P so that 
C + = 1/CqP. This implies the P-equi valence, 



d(C) hh + + 9{C\ V2 

jtili (^ )[ 2 [> 2 cx 



de a 



2?i + 



1 ' 
. (54) 

so by the criterion above PCL estimation is manifestly 
equivalent to the NRML method. This means that if the 
power spectrum in the data is close to constant and the 
purpose of the analysis is to estimate slight deviations 
from the constant shape there is indeed no gain from 
applying the full NRML procedure. The error bars on 
any parameters will be just as small using the far less 
costly PCL analysis. 

Now a realistic power spectrum, in particular the con- 
cordance ACDM model, is not at all constant. The I de- 
pendance at low to moderately high multipoles is a l/l 2 
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drop-off modulated by acoustic peaks. To understand 
why this complicates the situation we need to study the 
general structure of covariance matrices in more detail. 
It will turn out to be useful to study the normalised co- 
variance matrix C defined by 



(C)lll2 
1 1 

c? c? 

tl 12 



(55) 



Understanding this object is particularly useful because 
we can rewrite 



(C' iQ )[ 1 [ 2 (C + )i 1 [< i (C + ) l2l j ! 

{c ^f- (c*(c+) Wi c|) (cl{c + ) hV ci 



C 2 Cf 

(C,ct)til 2 ir> + \ ir + \ 
1 — —(L- jlilil^ Asia 

C 2 c? 

h <2 



C?, C, 2 , 



' C 2 C 2 

1 l 



where we used a result proven at the end of Appendix 
[Cl to pull out the normalisation factors inside the pseu- 
doinversion. We see that merely pseudoinversion of C is 
required. 



B. General covariance matrices from oblique 
projections 

In this section we study the structure of masked co- 
variance matrice^l which previously we showed took a 
simple projected form (1281) . As argued above without 
loss of generality, we will represent these covariance ma- 
trices in a normalised form C defined by: 



(C) hl 



\{P) hVi C Vi {P) Vil2 \ (56) 



c? 

ll 



The starting point for this general discussion is the in- 
sight that we can write 



c = PcPb 

where we normalise the projection operator P as 

1 



(P 



C)hh 



(57) 



(58) 



Like P defined in ([26|l . the matrix Pc is also a projection 
operator satisfying (P c ) 2 = P C , (Pc) 2 = p c- 



How- 



ever, Pc is not an orthogonal projection since Pq ^ Pc 
- instead it is an oblique projection. To analyse the struc- 
ture of Pc we consider its four matrix subspaces: the 



H 




/ 

Q 








~i 


Im(P c ) 







Figure 2. Images and kernels of the projection operators in 
two dimensions for a P of rank 1. The successive projections 
leading to the single nonzero eigenvalue are shown as well. 



image (or column space) Xm(Pc), the kernel (or null 
space) JCer(Pc), the row space Ira(Pp) and the null 
space JCer(Pc) for the transpose. Now for any linear 
mapping, we must have the following orthogonal com- 
plements: 



lm(Pc) JL ICer(Pl) 
Xm{P T c ) _L Ker{P c ) 



(59) 
(60) 



Further, the covariance matrix structure (|57[) implies we 
also have 



Xm(C) = Xm{P c Pc) = Xm{P c ) , 
/Cer(C) = JCer{P c Pc) = ^r{P T c ) ■ 



(61) 
(62) 



Recalling (|20|) . the eigenstructure of C will include 
M non-zero eigenvalues for the eigenvectors spanning 
Xm(Pc) and JC zero eigenvalues for those spanning 
/Cer(P^J). Now it can be shown that the non-zero eigen- 
values must all have A„ > 1 (n = 0, 1, 2, A/"). This 
is because an eigenvector v £ Xm{Pc) with non-zero 
eigenvalue A must satisfjH ||Ppu|| > ||u||. Similarly 
any vector u 6 Im(Pp), in particular u — Pj^v, must 
satisfy ||Pc?ii|| > ||u||. Hence if v is a (unit normalised) 
eigenvector of C with non-zero eigenvalue we need 



A,. 



\Cv\ 



\P c P T c v\\ > \\Plv\\ > \\v\\ > 1 



(63) 



Thus we expect the spectrum of C to consist of K, zero 
eigenvalues and M eigenvalues that are equal to or larger 



In fact one could also study more general matrices of the type 
A~2PA.PA.~2 for invertible positive definite A and any or- 
thogonal projection operator P but for isotropic models the more 
restricted form (with A diagonal) provided here is sufficient. 



We can define PqV — v:=k and we must have fc £ /Cer(PjE) as 
is a projection. This gives ||jP£w| 2 = j|fc + u|| 2 = v 2 + k 2 + 
2v ■ k but since Xm(Pc) -L 'Cer(P^) and v G Xm(Pc) we have 
v k = 0. Thus ||Pg«|| 2 = v 2 + k 2 >v 2 or \\P%v\\ > \\v\\. 
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than unity. 

If we restrict attention for a moment to the two- 
dimensional case with P of rank one, we can visualise 
the situation in a simple two-dimensional context as is 
shown in figure [2] The kernel of C is just given by 
K.er(P c ) and the eigenspace with the single non-zero 
eigenvalue A is Im(Pc)- Note that the two eigenspaces 
are orthogonal as is required since C is symmetric. The 
point to emphasize here is that A is exactly unity if 
Xm(Pc) -L ICer(Pc) and can become arbitrarily large 
when the two are aligned. As discussed above A can 
never be smaller than one. 

Despite the fact that we were discussing the simple 
two-dimensional case this will still be true in higher di- 
mensions. C will have a very large eigenvalue when there 
are directions in Im(Pc) that are almost aligned with 
directions in JCer(Pc)- Directions in Im(Pc) that are 
still close to orthogonal to K,er(Pc) will still correspond 
to eigenvalues close to unity. 

This insight is interesting as for masked covariance ma- 
trices with power spectra exhibiting strong changes (as 
it is the case for realistic power spectra) there are gener- 
ically aligned directions associated with the mask, as we 
will see below. These directions will come with large 
eigevalues that cause the covariance matrix to be differ- 
ent from a simple orthogonal projection operator, that 
is, if lm(Pc) and JCer{Pc) were orthogonal (which they 
are not in general). 

Im(Pc) is spanned by the linearly dependent set of 
unnormalised vectors V[ Q defined by 

Pi i ( Y i R U), 
K)«=^-¥ = ij M i ) (64) 
Cf Cf 

while JCer(Pc) is spanned by the linearly dependent set 
of unnormalised vectors k[ a given by 

< M = ^J±Z^ll. (65) 

Cf Cf 

Looking at these vectors it is evident that for a decaying 
power spectrum there must be aligned directions. For a 
given [ each pair vi and k[ are aligned, especially pairs 
with low I. This is due to the decay of the power spectrum 
that suppresses the difference between the two vectors in 
the [ component and upweights their high I components 
that are identical. 



C. Enhanced eigenvectors from l/l 2 power spectra 

As the CMB power spectrum initially decays approx- 
imately like l/l 2 with some modulations this is an im- 
portant case to understand. Eventually we will have to 
take into account that due to beam smearing and Silk 
damping there will be a much faster exponential decay 
at high multipoles. This will cause some complications 



that will be discussed below. 

The mask leads to a coupling of different multipoles. To 
arrive at a simple model for the coupling we can simply 
assume that we have as a rough order of magnitude es- 
timate | (Y^U) I ~ l/{\l + 1)- This in turn means 

1 /2 

that we have for the vectors normalised by C l oc 1 fl 

IK)tlH(fcfo)d~OT)-z| + i) (66) 

In other words, for moderately large Iq these vectors are 
strongly peaked at Iq and have most of their support in a 
vicinity of Iq. On the other hand, for low Iq their magni- 
tude is approximately constant and they have significant 
support for all I. To calculate inner products involving 
V{ or fc[ with large Iq we can restrict the sum to a 
vicinity of Iq. But for large Iq the power spectrum does 
not change much for a l/l 2 decay and is locally almost 
constant. This means that the v\ with large Iq remain 
almost orthogonal to the kernel spanned by the fe[ be- 
cause any (unnormalised) (Y^U) is exactly orthogonal 

to all the 5 to[ - (y t fC/) r 

Only the low Iq V[ get aligned strongly with the cor- 
responding ki and potentially with each other. Thus 
any enhanced direction v must be essentially a linear 
combination of the V\ with low I, i.e. can be written as 

« ,« ( A ) io(«io)i = 1 (67) 

Cf 

for some vector A with components that are only signif- 
icantly different from for low I. 

Further insight can be gained from considering the sim- 
plified situation of a pure sky cut that is also symmetric 
about the equator O = ir/4 instead of the full mask with 
its asymmetric galactic cut and pointlike features spread 
over the sky. In this case Vi with different m are mutu- 
ally orthogonal and the projection operator Pc as well 
as the covariance matrix is block diagonal in m. Fur- 
thermore, since we assume the sky cut to be symmetric 
about the equator, we conclude that V\ with even and 
odd I must be mutually orthogonal due to parity. Simi- 
larly Pc and C must be block diagonal in parity. This 
obviously means that we can choose a basis of eigenvec- 
tors of C such that every eigenvector is entirely contained 
in one of the subspaces with a given m and p denoted by 
QJ™ (p = + for even parity and p = — for odd parity). 
The point is that the set of directions v parametrised 
by A contains essentially only one vector in each of the 
QJ™. To see this, note that for low I dominated A the 
high I components of v are governed by the discontinu- 
ities in U(n) (Y^(n) (A) Iq ). As is depicted in figure[3]a) 
we can always split this expression into a sum of a part 
that looks like the mask itself with its step discontinu- 
ities and the associated ringing in multipole space and a 
part that is continuous and does not contribute signifi- 
cantly to the high I behaviour. But this means that for 
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a given m, p all the directions t> c parametrised by A in 
the equation above are very aligned. Their low I differ- 
ences arising from the continuous part in figure |3] a) are 
suppressed by the scaling of the power spectrum. Hence 
there can only be one large eigenvector in each 23™ . This 
eigenvector will approximately look like either v\ m \ m if 
to even and p = + or to odd and p — — or v (| m |+i) m 
otherwise. Of course kernel and image are are objects 



e = tt/4 



+ 



b) 




Figure 3. a) For a pure sky cut symmetric about the equator 
any of the directions i>i for moderately high lo is essentially 
the same as i>| m „| mo or i>(| m[) |+i) mo as the high I behaviour 
is governed by the discontinuities in U(n)Yj^(n). The figure 
shows schematically how these discontinuities (in the Q di- 
rection) are essentially the same as those of the sky cut. b) 
A schematic picture of the image 3c and kernel Rc of Pc 
restricted to a given 23™, i.e. 3 C n 23™ and & c n 53™. All 
directions in the kernel except for k c are suppressed in this 
picture to enable us to represent the situation in three dimen- 
sions. The directions in the kernel orthogonal to k c should 
be thought of as being orthogonal to the image as well. The 
important feature is that any direction in 3c H 23™ orthogo- 
nal to v c is also orthogonal to the kernel and hence cannot 
correspond to a large eigenvalue. 



of large dimensions so it is somewhat difficult to get a 
geometrical intuition. However, we can draw the three 
dimensional schematical picture in figure [3] b) suppress- 
ing most directions. It shows geometrically how for given 
77i and p and for a pure sky cut there is a single enhanced 
direction v G 03™ aligned with JCer(Pc) and all vec- 
tors in lm(Pc) H 23™ orthogonal to this vector are also 
approximately orthogonal to ICer(Pc) and hence cannot 
be eigenvectors with large eigenvalues. 

As the covariance matrix is block diagonal for a pure 



sky cut it is numerically feasible to calculate its eigensys- 
tcm. To confirm the ideas above numerically, we picked 
the to = block of C and determined its eigenvalues and 
eigenvectors working up to Z max = 500 and with a 20° 
cut. Recall that the discussion above suggests that there 
are merely two enhanced eigenvectors: one corresponding 
to even parity and the other one to odd parity. The re- 
sults are shown in figure 3] and are clearly consistent with 
this picture. The enhanced eigenvectors are compared to 
the predictions based on the discussion above and show 
excellent agreement except in their low I components. 

If we are not dealing with a pure sky cut and the mask 
is neither azimuthally symmetric nor parity invariant, we 
cannot draw the same conclusions on the precise form of 
the enhanced eigenvectors. The enhanced directions can 
be any linear combination of the low I V\. It should be 
noted that in both cases, given we know the set of en- 
hanced directions and label the corresponding eigenval- 
ues and normalised eigenvectors A„ and b n , the covari- 
ance matrix C can be written as 



(68) 



where Pq is the orthogonal projector onto Im(Pc)- We 
then simply hav^l 



1 

An 



(69) 



Summing up, for the case of a power spectrum that 
is not constant but approximately decays like I /I 2 the 
masked covariance matrix is different from a simple or- 
thogonal projection operator by a number of enhanced 
eigenvectors. They are related to masked low I spheri- 
cal harmonics. In the case of a pure sky cut there are 
very few of these enhanced eigenvectors due to the high 
amount of symmetry of the mask. 



D. High-/ correlations from damping and beam 
smearing 

The true situation is more complicated than the sce- 
nario discussed above. While the power spectrum decays 
initially approximately like 1 /I 2 , eventually power will 
be exponentially suppressed by Silk damping. On top 
of that, CMB experiments generically have a finite (al- 
beit small in the case of high resolution experiments) 
beam width. Estimating the power spectrum from a 
map created from the measurements will not yield the 
correct theoretical power spectrum unless we take this 



The Moore- Penrose pseudoinverse of C must satisfy CC + = Pp. 
It can be seen that this is the case for the expression given above. 
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Figure 4. The eigenstructure of the m = block of C for a pure 20° sky cut and l maJi = 500. We see that among the 501 
eigenvalues there are a number of eigenvalues corresponding to fCer(Pc) restricted to the m — subspace. We made the claim 
that any non-zero eigenvalue must be > 1. However, we see that the transition from to 1 is not a simple step but there a a few 
eigenvalues between and 1. This is most likely due to the fact that P (and hence Pc) is not exactly a projection operator 
unless one works up to infinitely high i max as we discussed earlier. Thus, it is not inconsistent with the discussion above. 
There are indeed only two enhanced eigenvalues as we would expect: one corresponding to even parity (i.e. the corresponding 
eigenvector has only non- vanishing components with even I) and the other one to odd parity (i.e. the corresponding eigenvector 
has only non- vanishing components with odd I). The insets show the eigenvectors corresponding to the respective eigenvalues 
and compare them to the predictions vqo and Viq. As expected, they show excellent agreement except for very low I. 



beam smearing into account. For a perfectly azimuthally 
symmetric Gaussian beam, it can be shown that the 
convolved power spectrum C; is obtained from the true 



power spectrum C\ 



via Ci = tfCf 



where 



bi =cxp [-1(1+ l)y 



(70) 



and a = fwhm/ (2y2 In 2) with fwhm the full width at 
half maximum of the beam. We see that beam smear- 
ing leads to a significant additional exponential suppres- 
sion of power at high multipoles. Instead of the slight 
I /I 2 drop-off that we treated as locally constant above, 
the exponential decay introduced by Silk damping and 
beam smearing will eventually dominate the power spec- 
trum at high I and lead to big relative changes. As a 
result, we cannot draw the conclusion that only low Iq 
Vi get aligned with the kernel. Taking beam smearing 
into account the V[ with high Iq can now become strongly 
aligned with the kernel too. As we will see below, this 
might pose significant problems for PCL estimation. 



V. CHOOSING A NEAR-OPTIMAL BASIS 

Having discussed some features of masked covariance 
matrices, we will try to exploit the insight we obtained 



to make a choice of basis functions which gives an ap- 
proximation to the NRML estimator which improves over 
PCL estimation. Our strategy will be to start with a set 
of PCL basis functional. 



(o,n)(h, h) 



2*1 + 1 



(72) 



and complement it with additional basis functions Q(i )n ). 
As a guiding example we will consider the case of an am- 
plitude estimator, that is, we will estimate the amplitude 
of a given fiducial covariance in the data. We choose this 
somewhat simplified example to explain our method be- 
cause it makes the expressions very transparent but it 
should be noted that this case is directly relevant for 
parameter estimation. The overall amplitude of fluctua- 
tions is indeed part of the minimal set of parameters that 



10 Note that the normalisation (2/i + 1) in the denominator is not 
necessary and has no effect on the approximated estimator. How- 
ever to be more consistent with standard PCL methods we in- 
clude the normalisation so that the inner products of the basis 
functions with the data 

<3(0,n)(ll,[2)o tl Ot 2 =^>nm/(2n + 1) = <5„ (71) 

m 

are simply the standard pseudo power spectrum coefficients C n . 
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need to be estimated from the CMB data. 

We emphasise that including more parameters does not 
affect the applicability of our method. While we motivate 
our choice of basis functions with a simple example the 
same basis will be used in the general setting. 



A. Augmented basis beyond PCL 

An amplitude estimator e estimates the amplitude e 
of a covariance Co in the data so that the full fiducial 
covariance is C = eCo. We then simply have C e = Co 
so that, according to section IIII A[ the expression that 
needs to be expandable in the basis functions is 



(C,e)[i( 3 ,,,+% {r + ,. 1 _(^o)lil 

1 — — \yoih ti Ivo > h v 2 — — r 



G? C{ 



C 2 C$ C?,C}, 



(73) 



In fact the expression only needs to be expandable up 
to P-equivalence as was discussed in section IIII Al We 
remind the reader that the concept of _P-equivalence was 
introduced in section Hi CI 

In section ITVl we learnt that the normalised covariance 
Co given in (|55[) is essentially the orthogonal projection 
operator P^ defined in llV Cl augmented with a number of 
enhanced eigenvectors, that is, eigenvectors with eigen- 
values greater than unity. This yielded a simple expres- 
sion for Co given in (|68j) and a corresponding expression 
for Cj in (|69| . 

PCL basis functions are on their own perfectly able to 
expand the first part of Cq arising from Pq because of 
the following equivalence, 



C? C, 2 


c? 

tl 


1 

Mi— r 
C V 

'2 






h 




P 1 A 











(75) 
(76) 

In going from the first to the second line we made use 

1/2 

of the fact that (P) [/ [ 2 / C v ' is in the image of Pc and 

thus mapped to itself by the action of P c . Hence, we 
only need to worry about the enhanced directions. 

We saw that for a pure sky cut as well as for more gen- 
eral maskt03 the enhanced eigenvectors of the normalised 
covariance matrix Co are associated with certain 'masked 
directions' V\ defined in equation This motivates 



For a general mask we were still able to conclude that those 
enhanced eigenvectors associated with the initial 1/Z 2 power law 
drop-off should be linear combinations of low 1 D|. 



the inclusion of a second set of basis functions 

Q(i,0(li s la) = A-(«0t>0 fa "4 (77) 

c~l ci 

= -L(P) IlI (P) [(B J- (78) 

Recall that the index [ here represents pairs (l,m), so 
there are a large number of these basis functions. In- 
cluding all of them up to a certain I means adding an- 
other (/ + l) 2 basis functions. It is evident that this is 
only possible up to fairly low I values since otherwise 
the high number of basis functions renders the problem 
intractable just as was the case for the original NRML 
estimator. 

A more tractable option, which also gives rise to very 
accurate estimators, is to choose m-summed versions of 
these basis functions given by 



run 1 2 



l 



(79) 



We find strong evidence that these perform only marginally 
worse than (fTT|) for a 1/Z 2 drop-off of the power spec- 
trum, and lead to significant improvements for the full 
case when exponential suppression at high I is significant. 
Their huge advantage is that we can include all of them 
over the full I range while only doubling the number of 
basis functions, so that matrix inversion is still tractable. 

Furthermore, note that both the m-summed and the 
full set of new basis functions is well suited for numer- 
ical evaluation because of their separabilty and relation 
to the projection operator. This greatly reduces the nu- 
merical effort needed to evaluate the estimator and will 
be discussed in more detail below. 



B. Demonstrating the efficiency of the augmented 
estimator 



We proceed by verifying numerically that the new basis 
functions indeed improve the accuracy of the amplitude 
estimator to the extent that it becomes very nearly op- 
timal. Another key issue is to test whether or not it is 
sensible to use the m-summed set of functions instead of 
the full set to arrive at approximations to the NRML es- 
timator. To do this we used both choices, that is, the full 
and the m-summed set of basis functions combined with 
the usual PCL basis functions, to arrive at an approxi- 
mation e' to the amplitude estimator e on a sky masked 
with the WMAP KQ75 mask. A useful feature of the 
amplitude estimator is that one can obtain an accurate 
estimate of what the optimal limit of the variance pre- 
dicted by the Cramer-Rao bound should be. The ratio 
of the two is called the efficiency E of the estimator 



E(e') 



Var (e') 



(80) 
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Obviously the Cramer- Rao bound requires E < 1 and the 
larger E the better our approximated estimator uses the 
information in the data on the parameter of interest. For 
the true NRML estimator we should have E = 1, at least 
in the limit of large sample size. Now for an amplitude 
estimator, the Fisher information at e = 1 (i.e. for the 
case that Cq is the true covariance) simplifies to 



Tr 



F(e=l) 



(CoCn 



Tr [P] _ N_ 
2 2 



where Af = rank(Co) as in section Hi Bl The discussion of 
the PDF of multipole coefficients on an incomplete sky in 
appendix[A]suggests that the rank of a masked covariance 
matrix should be approximately given by (also compare 
results in [171 ]) 



■A/* ~ ,/*sky(^max ~1~ 1 ) 



(81) 



Hence as a benchmark for our amplitude estimator we 
estimate the efficiency at e = 1 to be 



E(e') = 



1/F(e) 



da T £— 1 da 



Var(e') / sky (Zmax + If 



(82) 



where a and £ are defined in section IIII1 To calculate 
the variance of the estimator numerically we generate re- 
alisation of the CMB and mask them subsequently with 
the WMAP KQ75 mask using HEALPix routines with 
A'sidc = 512. This corresponds to the pixel space reso- 
lution of the KQ75 mask and should allow for sufficient 
accuracy working up to ^ max = 700. As the fiducial power 
spectrum Cq.i giving rise to the masked fiducial covari- 
ance Co we use the concordance ACDM model. Note 
that we do not multiply by the beam function initially in 
order to check the arguments from the previous section 
for a simple l/l 2 drop-off without exponential suppres- 
sion at high I that causes additional complications. We 
calculate the vector (3 as given in section [TIT] for a given 
number of maps and then obtain da/de and £ via aver- 
aging over all maps 



da 

a7 



a 



(/3) 



t = -(Q3-a)(J3- 



of) 



(83) 
(84) 



Figure [5] is a plot of E(e') for approximations to the 
NRML estimator using the set of PCL basis functions and 
either the m-summed or the full set of additional basis 
functions up to a given Zi ow . We see that most of the 
gain in efficiency comes from including the lowest I basis 
functions as expected since these should correspond to 
the large enhanced eigenvectors for a l/( 2 drop-off. Fur- 
thermore, the plot shows that including the full, not m- 
summed, set of basis functions only leads to marginal im- 
provements. This is encouraging as it implies that good 
improvements might in general also be made with the 
m-summed set that remains tractable to arbitrarily high 



^low ■ 

Summing up the analysis of the simple amplitude es- 
timator for a power spectrum with l/l 2 drop-off indeed 
suggests that strong changes in the power spectrum, that 
occur mainly at low / for this kind of power spectrum, 
render PCL analysis suboptimal. For this simple l/l 2 
drop-off including additional basis functions of the type 
discussed above seems to recover nearly all of the infor- 
mation lost in a PCL analysis. 

As is discussed in IIVDI exponential suppression at 
high I due to beam smearing and Silk damping can cause 
aditional complications and information loss. Tests with 
the amplitude estimator for a power spectrum with beam 
suppression show that while there are significant im- 
provements in efficiency one does not reach the optimal 
limit quite as well with m-summed basis functions. We 
do not include the results here but rather discuss the 
estimator for a power spectrum with high I suppression 
from a more general point of view below in section I VII 



C. Summary of the general methodology 

After having introduced the general method of estima- 
tion in section Mil and having chosen a suitable set of 
basis functions above this is a sensible point to provide 
a rough summary of our approach for evaluating the it- 
erative estimator. Without any simplifications the steps 
involved are: 

1. Start with a fiducial guess for the parameters e a . 
It is sensible to use a fairly advanced guess that 
is reasonably close to the true parameter values to 
speed up convergence of the iterative estimator. 

2. Using these parameter values calculate the coeffi- 
cients cc, M 2 - and the covariance matrix of the basis 



£ defined in (|45|) and (|47|) respectively assuming the 
fiducial values of the e Q . The basis functions used 
are the joint set of the standard PCL basis 



Q(Uri)( [ l> h) 



<5nZi<5[il 2 

2h + l 



(85) 



and the new set of basis functions 

Q(l,n){hi h) = ^2 7^(-f)[i™(^)nml2 7r (86) 



3. Calculate (3 of the actual data. Note that, strictly 
speaking, this also assumes a choice of fiducial pa- 
rameters e Q in the definition of the new basis func- 
tions. 

4. Calculate Se a via equation (|43|). 

5. a) If the calculated adjustment to the fiducial pa- 
rameters is sufficiently small, i.e. the procedure 
is converged, then the updated parameter values 
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Figure 5. The efficiency E(e ) of approximations to the NRML ampiitude estimator based on PCL basis functions and either 
the full set of additional functions (dots) or the corresponding m-summed versions (circles) up to a given Zi ow . We obtained 
both a and £ used to calculate E from averaging over MC samples. This will lead to errors due to insufficient convergence. We 
plotted the curves for averages taken over 100k, 200k, 400k, and 800k MC samples. It is evident that due to the higher number 
of basis functions that are included in the full set, for low numbers of MC samples the corresponding E seems to diverge from 
the corresponding value for the m-summed set and in fact goes superoptimal. This is an artefact due to the finite number of 
samples that disappears as we increase the amount of MC runs included in the averages. For a sufficiently high number of MC 
samples the two curves are very close to each other and both approach E ~ 1 as one increases Zi ow . Evidently the biggest gains 
in efficiency are made by including the lowest I modes as is expected from the discussion in section HVCI for the case of a l/l 2 
drop-off of the power spectrum. 



are the final estimates. An estimate for the errors 
is given by T~l calculated from the quantities ob- 
tained in step 2 with the definition of T a p given in 
equation (pH)) . 

b) If the correction to the parameters is significant, 
the updated parameters are used as the input for 
step 2 for the next iteration. 

The biggest source of computational effort in this proce- 
dure is the calculation of a, J^- and £ for the chosen set 
of basis functions in step 2. 

Calculating j3 for a given map is simple: Obtaining the 
inner product with the PCL basis functions is extremely 
simple as it just amounts to evaluating the quantities 
a f an d' more importantly, the inner product with 
the new basis functions can be calculated rapidly due to 
their separable form and relation to the projection opera- 
tor. It does not require C(^ ax ) operations as one would 
naively expect for such an inner product. Effectively the 
inner products with the new set of basis functions can 
be obtained by dividing the multipole coefficients a, of 
the realisation by the power spectrum, generating a new 
map from these altered coefficients and calculating the 
PCL coefficients of these new maps. Analysing and gen- 
erating maps can be done rapidly with HEALPix using 
the built-in Fast Spherical Transform routines. 

Since we have a fast way of obtaining (3 for maps we can 
use MC sampling to calculate £ to the desired precision 
as described in section [TTTl 



Calculating a. and j^L a l so seems to be quite computa- 
tionally intensive. However, if we assume that our initial 
choice of parameters is not too far off the actual parame- 
ters it is justified to simply use the corresponding power 
spectrum for the construction of the new basis functions 
and keep the basis the same throughout the iterative pro- 
cedure. Most of the work needed to calculate a and M^- 
in each step can then be done as a one-off calculation at 
the start of the procedure. This is because we can write 

(«)(i,n) = (M.) ihn} iCi (87) 

with 

(M)(i,n)l = Q(i,n)(h, ^2(P)hlm(P)lml 2 (88) 

m 

which is independent of e given we keep the basis function 
the same. Calculating Ad can be cumbersome but only 
needs to be done once. The calculation of a and -f 2 - for 
each iteration step reduces to the computation of Ci(e a ) 
and which is much simpler. 

Note that there is a close relation of Ad to the ma- 
trix M relating the true power spectrum coefficients Ci 
to the expectation values of the pseudo power spectrum 
estimators C/ via (|34|) within the framework of PCL es- 
timation. In fact M is simply a submatrix of Ad and we 
have 

(M) hh = (M) {0Mh (89) 
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Finally one can avoid the many MC runs involved in 
the computation of £ in each step if the fiducial param- 
eters are already close to the true parameters. If this is 
the case one can adopt the point of view that the updates 
change £ only little throughout the iterative procedure 
and hence it is justified to simply stick to the fiducial £ 
without altering the final parameter estimates consider- 
ably. 

Taken together these measure make the above proce- 
dure very efficient except for a one-off calculation at the 
start of the iterative process. 



VI. VALIDATION: COVARIANCE MATRICES 
OF APPROXIMATED NRML ESTIMATORS 

So far we have only shown how the new basis func- 
tions improve the accuracy of the amplitude estimator 
as an easy example. As a benchmark for the quality of 
the approximation we used a single number, namely the 
efficiency of the estimator. To compare multi-parameter 
estimators that are used in a realistic analysis we need 
the full covariance matrices of the parameter estimates. 

Suppose we want to estimate a set of parameters e a 
with an approximated NRML estimator. The variance of 
the estimates is then approximately given by If we 
want to compare two approximations a, b with covariance 
matrices T~p(a), then the natural criterion is that 

we call the estimator a better than & if as a matrix equa- 
tion I?-* (a) < J?-* (6) in the sense that (b) - (a) is 
positive definite and vice versa. Note that the Cramer- 
Rao bound requires > so there is a limit on 
how good an estimator can be. (a) < (b) is iden- 
tical to the requirement T a p(a) > T a p(b). Now instead 
of comparing covariance matrices for a large number of 
conceivable sets of cosmological parameters we can make 
use of the fact that the covariance depends on the pa- 
rameters only through the power spectrum coefficients, 
that is, we can use the chain rule to write [l3l 



1 da ! da 

1 a/3 — ttt; s "5 

2 oe a oep 

1 dCi ± da x da dCi : 
= 2 de a dC u € dC u de 



dc h i dc± 

de a 1 2 dep 



(90) 
(91) 
(92) 



Hence we clearly have r~^(a) < T~^(b) for all conceiv- 
able parametrisations of the power spectrum if and only 
if r hh (a) > T hh (b), i.e. r-l 2 (a) < T-] 2 (b). For this 

reason it is sufficient to focus on the Ci covariances F, - } 
and compare these for different approximations to the 
NRML estimator. 

Figure [6] depicts the covariance matrices for dif- 
ferent approximations of the NRML estimator for the 
WMAP power spectrum in the absence of the WMAP 
beam and without the inclusion of noise. The upper plot 



shows the diagonal of T ; \ divided by cosmic variance 
2/(21 + 1) for (i) a PCL basis, (ii) an augmented basis 
([5B"f with the first 100 m-summed basis functions and 
(iii) with the first 1000 augmented basis functions. One 
can see that in the case of a pure I /I 2 drop-off most 
gains are made at low I where a strong drop-off occurs. 
The inclusion of the higher I basis functions only leads to 
marginal improvements. The density plots show 20 x 1000 
submatrices of the full covariance matrices for the differ- 
ent sets of basis functions, scaled such that the diagonal 
entries are exactly unity, i.e. the plot actually shows 
r i ~ i2 /(F i ~ ; 1 i F ; ~^)2 . This makes it easier to plot the off- 
diagonal elements that are typically much smaller than 
the diagonal. The topmost density plot shows the covari- 
ance structure for the simple PCL set with off-diagonal 
low ^-high I coupling clearly visible. The plot below shows 
the covariance of an estimator based on PCL functions 
and m-summed basis functions up to l\ ow = 100. The 
plot illustrates how the correlated errors have been re- 
moved by including the new observables. 

The improvements from the augmented basis functions 
()86[) are similar to what the usual hybrid estimator is 
meant to achieve. Instead of combining PCL estimates 
with further observables constructed from the same map, 
the hybrid estimator combines them with ML estimates 
for the low I multipoles obtained from smoothed maps. 
Whether or not this eliminates the correlated errors in 
the same way remains an open questional. The bottom- 
most covariance matrix corresponds to the inclusion of 
the full set of m-summed basis functions. It looks very 
similar to the one above as is expected. 

Figure [7] is similar to figure [51 but now the covariance 
matrices Fjj are based on the WMAP power spectrum 
multiplied by the WMAP beam. Again the upper plot 
shows the diagonal of F^] 2 divided by cosmic variance 
2/ (21 + 1) for the various sets of basis functions. Just as 
in the pure l/l 2 case, one can see that improvements over 
PCL can be made at low I. However on top of that we 
see that the variance of PCL estimates blows up at high 
I where the power spectrum is exponentially suppressed 
due to the beam. This is a new feature. It is evident 
from the density plots in figure [7J that significant im- 
provements can be made by including the new high I 
m-summed basis functions. The leftmost plot shows the 
covariance matrix resulting from just using PCL basis 
functions. As in figure [6j one can see the blue edges cor- 
responding to the low ^-high I coupling introduced by the 
strong decay of the power spectrum at low However, 
now there is also significant off-diagonal structure at high 



12 Reference [l4j did not obtain covariance matrices for the hybrid 
estimator from MC simulations but instead assumed that the ML 
procedure returns the exact full sky Ci estimators and calculated 
the covariance of the hybrid estimator based on this assumption. 
This is not necessarily a good assumption because the low I mul- 
tipoles cannot be exactly recovered for fluctuations that are not 
band-limited on an incomplete sky. 
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Figure 6. Covariances r, ii2 for PCL basis functions ([85} and then augmented with the m-summed basis functions (|86|) up to 
'low = 100, 1000. The upper plot shows the diagonal of TJ'] divided by cosmic variance 2/(21 + 1). The density plots below 
show 20 x 1000 submatrices of the full covariance matrices scaled such that the diagonal entries are exactly unity. The colour 
scale goes from red (everything above 0.01) to white (0) to blue (everything below -0.01). These three plots visualise the 
off-diagonal structure introduced by the low i-high I coupling in the PCL analysis (top) and then they show how it vanishes as 
one includes the m-summed basis functions (|86[) up to 100 (middle) and 1000 (bottom). 



/. Next to it in figure [7] is a density plot of the covariance 
structure for an estimator based on PCL functions plus 
the m-summed basis functions up to Zi ow = 100. The 
inclusion of the low I basis functions eliminates the low 
/-high / coupling and also has a beneficial effect on the 
high I correlations on and near the diagonal. Including 
all m-summed basis functions removes most of the re- 
maining off-diagonal structure at high I, as is shown in 
the density plot on the right, contrasting starkly with the 
PCL analysis. 

The numerical results in this section where derived us- 
ing a HEALPix grid with 7V s id G = 512 which is typically 
recommended for an analysis up to / max ~ 1000. How- 
ever, it is clear that when analysing to high multipoles 
the finite pixelisation starts to play a significant role. In 
particular one might become concerned that the increase 
in the variance in the case of beam suppression could be 
a result of analysis at finite pixel space resolution. This 
issue is addressed in appendix [D] where we note that it is 
in fact prudent to use pixelizations with N s ide = imax to 



achieve high accuracy. However, although the finite reso- 
lution has a small effect on the variance at high I it is not 
primarily responsible for the dramatic increase observed 
in figure [7] 



VII. SUMMARY AND OUTLOOK 

We have studied the performance of PCL-based CMB 
analysis in the absence of instrumental noise and we have 
addressed the question whether PCL estimation can re- 
cover information in the data in a nearly optimal fashion, 
as has been claimed in the literature. For a power spec- 
trum that exhibits little change over the entire I range, 
we concluded that PCL estimation does approach opti- 
mality. However, this is not the case when large relative 
changes occur. Realistic power spectra, i.e. power spec- 
tra that are measured by high resolution CMB experi- 
ments, are far from constant. The concordance ACDM 
model initially decays like l/l 2 and thus exhibits large 
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Figure 7. Covariances r iii2 for PCL basis functions and extended basis up to Zi ow = 100, 1000. The upper plot shows the 
diagonal of T ; ~J 2 divided by cosmic variance 2/(2Z + 1). The density Plots show the full covariance matrices scaled such that 
the diagonal entries are exactly unity. The Color scale goes from red (everything above 0.05) to white (0) to blue (everything 
below -0.05). 



relative changes at low I. While the relative changes 
at intermediate / values are more modest, Silk damping 
and experiment-specific beam smearing eventually lead 
to an exponential suppression of power which gives rise 
to a strong decay of the spectrum at high I. We have 
shown that the decay of the power spectrum leads to the 
emergence of large eigenvalues in the covariance matrix. 
Intuitively one can think of these eigenvectors in the co- 
variance matrix as resulting from mode mixing due to 
the mask. A particular harmonic on the full sky con- 
tributes to measurements of other multipole coefficients 
on the incomplete sky due to the coupling of modes by 
the mask. If the power spectrum decays quickly, the spu- 
rious contribution to the measured signal from modes 



with smaller I is relatively large compared to the signal 
at that I value and the pollution is fairly severe. The 
full NRML estimator discussed above makes use of in- 
verse covariance weighting to simultaneously disentangle 
the couplings and down- weight the polluted modes in the 
data and hence gives rise to the lowest possible variance. 
For a constant power spectrum no down- weighting is nec- 
essary and PCL estimation is optimal, but this is not 
the case in general and PCL basis functions cannot ade- 
quately account for this coupling in the data and so they 
do not achieve optimality. 

The information loss in the framework of PCL estima- 
tion due to a non-constant power spectrum is clearly a 
phenomenon of practical concern. The important ques- 
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tion is whether or not this information loss is small 
enough to be ignored. If it can be neglected, then PCL 
analysis is a very attractive method due to its speed and 
robustness. The information loss due to the l// 2 drop-off 
alone (i.e. without further suppression at high I due to a 
beam function) appears mainly to affect the variance of 
low / modes and leads to some low /-high I correlation. 
One could argue that this is not very important because 
there is not much information in the low I modes. How- 
ever correlated errors can be important if not taken into 
account properly. As we have seen for a WMAP mask 
with an amplitude estimator, this effect leads to a 20% 
increase in the variance due to these correlations of the 
error bars. 

Strong decay in the power spectrum at high / intro- 
duced by accounting for beam smearing leads to poten- 
tially very significant effects at high I. For WMAP these 
effects set in after the measurements become noise domi- 
nated, so it is unlikely that this suboptimality has a large 
impact on the analysis. Here, the noise at high / has al- 
ready erased most of the information on parameters so 
there is little to gain. This would change if the noise con- 
tribution to the measurements was smaller over a wider 
multipole range (for the given mask) . Planck provides an 
interesting example in which a significantly smaller noise 
contribution is expected. If the suppression of the power 
spectrum due to Silk damping and beam smearing sets in 
earlier than the point at which the signal becomes noise 
dominated, then the information loss which we tackle in 
this paper might very well become relevant at high /. 

The main development in this work has been to present 
a general modal estimator framework in which we can 
systematically construct approximations to the NRML 
estimator based on arbitrary sets of basis functions. We 
have shown how PCL estimation can be incorporated 
into this framework but, more importantly, how it can 
be augmented with a small set of projected basis func- 
tions (|86p chosen to reflect the covariance structure of 
masked multipole coefficients. Including just a few of 
these new observables can systematically remove the low 
/-high / correlations due to a realistic mask. As we have 
discussed, this is a well known deficiency of PCL anal- 
ysis for which there are some alternative prescriptions 
used to ameliorate the problem, such as hybrid methods 
and apodization (though arguably more ad hoc than the 
present work) . Perhaps more novel is the fact that includ- 
ing the full / max of our m-summed basis functions also ef- 
fectively and efficiently removes high /-high/ correlations 
in the covariance matrix. We emphasise that these cor- 
relations may prove significant when Silk damping and 
beam effects come into play at high / before instrument 
noise dominates. 

There are a number of open questions and related prob- 
lems that need to be investigated in future work. First, 
we have done preliminary investigations on the effect of 
the inclusion of noise in the analysis, but this needs to be 
more thoroughly studied. While this mainly of interest 
in a Planck setting because of the potential relevance of 



the high / effects discussed above, many basic tests can 
be carried out within a WMAP framework. Furthermore 
a full iterative estimator based on the new m-summed 
modes introduced in this work needs to be implemented 
to demonstrate feasibility and investigate the potential 
benefits more thoroughly. In particular so far we have 
ignored the effect of the choice of modes on the endpoint 
of the iterative procedure and simply focused on our es- 
timate for the variance of the estimator as a measure 
of accuracy. For example using different basis functions 
might also influence the bias of the estimator. While it 
seems reasonable to believe that a set of basis functions 
that performs better according to our estimate of the re- 
sulting variance should generally lead to a more accurate 
localisation of the ML peak, more work has to be done 
to verify this. 

The method also has potential applications within the 
analysis of higher order correlation functions. In partic- 
ular the search for non-Gaussianity in the CMB fluctua- 
tions focuses on the bispectrum, i.e. the 3-point function. 
Again the optimal estimator requires inverse covariance 
weighting with a covariance matrix that is highly compli- 
cated due to masking of the sky and other experimental 
effects. Approximate methods related to PCL estimation 
are usually used to evaluate the estimator. Unfortunately 
this increases the error bars and hence makes it harder to 
detect statistically significant signals of non-Gaussianity. 
It might be possible to achieve nearly optimal results 
by including basis functions very similar to those intro- 
duced in this work but we leave this application for a 
future publication. Of course, the fundamental ideas in- 
troduced here to describe the effect of masking will also 
be relevant for observational data in other contexts, such 
as for the obscuration that occurs in three-dimensional 
galaxy surveys. 
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Appendix A: PDF of multipole coefficients on a 
masked sky 

Masking of certain regions of the sky leads to a sin- 
gular multipole covariance matrix. As stated above the 
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PDF of multipole coefficients restricted to the subspace 
Tm(C) can be written in terms of the unique Moorc- 
Penrose pseudoinverse of the masked covariance matrix 
C+. 

To understand this in more detail and obtain an intu- 
ition let us analyse the situation. Suppose the covariance 
matrix has a kernel /Cer(C). As the covariance matrix 
is symmetric /Cer(C) is orthogonal to Tm(C). Realisa- 
tions of the CMB on a masked sky a — (<Z[) must be 
fully contained in Xm(C). Hence, if we introduce a set 
of JV = rank(C) vectors {/„} that form an orthonor- 
mal basis of Tm(C) any realisation can be expanded as 
a = A n I n with A n = I n ■ a. For a more compact nota- 
tion denote the (7 ma x + l) 2 X N matrix that has the I n 
as colums by I. We have 



The normalisation factor is given by 

N = j d^Aexp (~^a T C + a 



J d^Aexp (~A T C^ 1 A 



(2tt)^ det (C A ) 



= ,/(2^det*(C) 



(A8) 



Where det* is the pseudo-determinant (i.e. the product 
of all non-zero eigenvalues) . As required N is like V as a 
whole independent of our choice of orthonormal basis in 
Tm(C) that was merely an auxiliary construction. 



(AA )=rCI := C f 



(Al) 



where C a is invertible. We are now in a position to write 
down the PDF in multipole space for the masked caseF^l 



V(A\C) 



(A2) 



exp 



Ia t c~/a 



N 



Now it is true that 



(ic-/i t ) C 



C (/C^ 1 J T 
II T := P 



where P is the orthogonal projection operator 
Im(C). This implies that IC~^I is the unique rV 
Penrose pseudoinverse0 of C 



a 



(A3) 
(A4) 
(A5) 
onto 

unique Moore- 



C+ = ICaI t 



We arrive at the final result 



P(A|C) = -exp 



-a 2 C + a 



(A6) 



(A7) 



A remaining issue is how to determine the dimension of 
the image J\f. This does not seem to be an easy task in 
general as the details of the shape of the mask will have 
to be taken into account. If the Y R were a complete set 
of basis functions on the set of N p i x pixels and we leave 
pixels unmasked, we would have the exact 



a set of iV™ m 
equality 



(In 



1) 



pix 

AW 



/sky 



(A9) 



where, for sufficiently dense pixelisation, / s k y can be 
thought of as the fraction of the celestial sphere that is 
not covered by the mask. This is clear since then we 
could think of the kernel of C as simply the space of fluc- 
tuations that only have support in the masked region. In 
the realistic case where (Z max + l) 2 < N p i x and the Y t R 
do not form a complete set this is not necessarily true. 
For reasonably high iV pix and Z max it seems likely that a 
sensible estimate is given by 



AA = / sky (/ max + l) 2 



(A10) 



This is also confirmed by results in [17] where amongst 
other things coupling matrices (P)[ 1 [ 2 = J (PnY^Y^U 
and the dimension of their kernels arc studied for generic 
mask functions U(n). 



Appendix B: Constructing the NRML estimator 



Note that this is now not a density in the full space of har- 
monic coefficients a anymore but rather in the smaller subspace 
spanned by the I n . A density in the full space does not exist for 
a singular covariance matrix as the Gaussian distribution is only 
supported on a subset of measure zero. 

The Moore-Penrose pseudoinverse of a matrix M is defined 
to be the unique matrix M + satsifying 1) MM+M = M, 
2) M+MM+ = M+, 3) (MM+y = MM+ and 4) 
(M+M^ = M+M. 



Here we briefly review the construction of the NRML 
estimator. Most of the details can also be found in [12|. If 
we use the Newton-Raphson method to find the zero x of 
a function f(x), i.e. we want to solve f(x) = 0, we start 
with a first guess xq and calculate the next approximation 
to x, via 

r f( x n) /-D1\ 

X n +1 - X n = O n+1 = -— (Bl) 

J ( x n) 

Given that the procedure will not converge to some other 
zero this will iterate towards x. Now if we straightfor- 
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wardly generalise this procedure to a system of equations 
and apply it to solve the system 



dlogV 
de a 



= 



we arrive at the iterative process 

_ x dlogV 



Se, 



T 



a/3 



dtp 



where 



■7"a/3 — — 



d 2 logV 
de a dep 



(B2) 



(B3) 



(B4) 



Explicitly we obtain 

dlogV = l d(C) hl2 
dtp 2 dep 

and 



(C+) hl[ (C + ) l2 v 2 (a Vi av 2 - (C) l[V2 ) 

(B5) 



J~aj3 — J^a + 8F a p 



(B6) 



where 



l d(C) h t . . d(C) vv 

^ - 2—d^r {c )iii i (c ^-^r - (B7) 

and in more compact notation 

SF a[5 = Tr[ (aa T - C) {CT^^C^C^CT 1 
--C~ 1 C tCt pC~ 1 )] 

Now if we assume that C is calculated from parameters 
that are fairly close to the true paramters then we obvi- 
ously have (SJ^afj) rs 0. So 8T a p is given by a distribution 
around zero. Fortunately we expect that for a realis- 
tic situation this dstribution is sharply (compared to the 
magnitude of J^a) peaked about its mean zero. For ex- 
ample for the estimation of a single parameter e that sim- 
ply changes the amplitude of the covariance, C = eCo, 
we have C 6 = Co and C tt = so that we can calculate 



25 rank (C ) 2 



(B8) 



But it is easy to see that J-" e ° e = rank (Co) /2 and for a 
typical high resolution CMB experiment rank (Co) is of 
order 10 6 . This means we expect 8T a p to be a correction 
of order 10~ 3 to J-® e . Hence in practice we can simply 
assume 



v 

J a 







{Fa?) — F a j3 



(B9) 



The final iterative procedure is thus 
8e n = IjTi^?^(C + V l (C + ) [2l , {a Vi a { , - {C) VA ) 



■Fop 



2 af} dep 
1 9(C) hl2 + + 



dc 







Although this is, precisely speaking, only an approxima- 
tion to the Newton- Raphson method for finding the max- 
imum of the likelihood, we refer to this as the Newton- 
Raphson ML estimator (NRML) estimator. This is con- 
sistent with the nomencalure in reference (T3 |. We have 
argued that this is in fact a very good approximation 
to the actual NRML estimator. However we have seen 
that the full NRML estimator is not simply quadratic in 



the data like the approximated version. Reference [12 1 
therefore calls it the quadratic estimator to emphasise 
this aspect. We chose not to do this to avoid confusion 
as, for example, the PCL estimator or any other approx- 
imations to the NRML estimator discussed in this paper 
are quadratic in the data. 

As is pointed out in [l2j this approximation should have 
no effect on the endpoint. Both methods should converge 
to the ML value of the parameters. So at most this will 
affect the speed of convergence. 



Appendix C: Equivalence of weighting factors 



We want to understand why it is legitimate to replace 



(CI) 



as a weighting in the estimator based on normalised mul- 
tipole coefficients. The replacement is not a strict matrix 
identity because the covariance matrix is not invertiblc. 
An easy way to see this is noting that the two sides do 
not have the same kernel. 

Let us study a slightly more general setting and introduce 
a set of 'normalised' multipole coefficients a = A ~ a con- 
structed from the original multipole coefficients a. A is 
some invertible matrix (in the case of interest just pick 

(A) ht2 = C^6 hl2 ). 

We are going to show that as a matrix identity 



C+ = P(A-y (A- 1 C(A- 1 ) T ) + A X P 



(C2) 



where P is the orthogonal projection operator onto 
lm(C). Note that for one thing both sides have the 
same kernel given by the vector space {(I — P)v,v E 
jjCmax+i)"} anc [ the same image given by {Pv,v E 

K C. n ax + l) 2 }. 



Proof: 

Let Pa be the orthogonal projection operator onto the 
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subspace {A 1 Pv,v e }. \\/g mus t have 

A X P = P A A l P. Using 

P = CC+ 
P A = (A- 1 C(A- 1 f) (A- 1 C(A- 1 ) T ) + 

we arrive at 

A - 1 CC + = A l P = P A A X P 
= {A- 1 C{A- l ) T ) (A- 1 C(A- 1 ) T ) + A l P 

acting on both sides with C + A gives 

(C + A)A- 1 CC + = C + 
= (C+A) {A- l C{A- l ) T ) (A- 1 C(A- 1 ) T ) + A 'P 
= P{A- l ) T (A- 1 C(A- 1 ) T ) + A^P 
Hence we have shown 

C+ =P(A- 1 ) T (A- 1 C(A- 1 ) T ) + A- 1 P (C3) 
as intended. 

Since P acts like the identity on (unnormalised) multi- 
pole coefficients this implies that we can replace 

A T C + A^> (A^CiA- 1 )^ (C4) 

as a weighting in the normalised estimator. 

i 

In our special case (A)^ = C^S^ t 2 this gives 

ci(c+) hli c^e + (C5) 

exactly as we wanted to show. 



Appendix D: Effect of the finite pixel space 
resolution 



We generally assume that the finite pixelisation of the 
sphere with a HEALPix grid does not have an effect 
on the estimates as long as one chooses a sufficiently 
high resolution. For example when calculating the ma- 
trix Mi 1 i 2 for the PCL estimator we implicitly assume 
that the resolution is infinite but the PCL observables 
are calculated from multipole coefficients obtained from 
pixelised maps that might not be related to the power 
spectrum in the same way. If we analyse a HEALPix 
map and determine its harmonic coefficients we have to 
keep in mind that these are not the actual harmonic co- 
efficients of the full map but rather approximations to its 
true multipole coefficients based on its pixelised version. 
The numerical results were derived using a HEALPix grid 
with Aside = 512 analysing the maps up to l meix = 1000. 
This should provide good accuracy. However when study- 
ing high I effects like the increase in the variance of the 
estimator when the suppression of power due to the beam 
is taken into account it is important to make sure that 
these effects are not simply due to the fact that we are not 
in the infinite resolution limit. Figure [8] shows the vari- 
ance of the PCL estimator for the WMAP power spec- 
trum suppressed by the beam as calculated from 100k 
MC runs for A sidc = 512, 1024, 2048. We see that while 
there are minor changes in the variance at high I go- 
ing from A s ide = 512 to A s id c = 1024 the qualitative 
behaviour is the same. Increasing A s ;dc to 2048 hardly 
makes a difference which suggests that this is already 
clearly in the infinite resolution limit for an analysis up 
to Z max = 1000. We conclude that for high precision es- 
timation it would be prudent to use pixelizations with 

-^side — ^max* 
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Figure 8. Plot of the variance of the PCL estimator for different values of the HEALPix resolution parameter iV s id c = 
512, 1024, 2048. While there are minor changes going from N B idc = 512 to 1024 increasing the resolution to 2048 hardly leads to 
any changes. This suggests that one has approached the infinite resolution limit for the purposes of an analysis up to multipoles 
with Z max = 1000. 
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